Let BackgroundProcessor drive HTLC forwarding #3891

Open
wants to merge 33 commits into base: main

tnull
Contributor

@tnull tnull commented Jun 25, 2025

Closes #3768.
Closes #1101.

Previously, we'd require the user to manually call process_pending_htlc_forwards as part of PendingHTLCsForwardable event handling. Here, we instead move this responsibility to BackgroundProcessor, which simplifies the flow and allows us to implement reasonable forwarding delays on our side rather than delegating to users' implementations.

Note that this also introduces batching rounds rather than calling process_pending_htlc_forwards individually for each PendingHTLCsForwardable event. The per-event approach had been unintuitive anyway: subsequent PendingHTLCsForwardable events could lead to overlapping batch intervals, with the shortest timespan 'winning' every time, since process_pending_htlc_forwards handles all pending HTLCs at once.

To this end, we implement random sampling of batch delays from a log-normal distribution with a mean of 50ms and drop the PendingHTLCsForwardable event.
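
As a rough illustration of what sampling a batch delay from a log-normal distribution with a 50ms mean could look like, here is a minimal sketch. It is not the PR's implementation: the `XorShift64` generator, the `sample_batch_delay_ms` function, and the shape parameter `sigma` are all assumptions made for this example, and the PRNG is not suitable for anything privacy-critical.

```rust
/// Minimal xorshift64* PRNG (hypothetical helper, illustrative only;
/// not cryptographically secure). The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545F4914F6CDD1D)
    }

    /// Uniform f64 in the half-open interval (0, 1].
    fn next_unit(&mut self) -> f64 {
        // Top 53 bits, shifted by one so that zero is never returned.
        ((self.next_u64() >> 11) + 1) as f64 / (1u64 << 53) as f64
    }
}

/// Draws one log-normal delay in milliseconds with the requested mean.
/// `sigma` is an assumed shape parameter; the PR text does not state one.
fn sample_batch_delay_ms(rng: &mut XorShift64, mean_ms: f64, sigma: f64) -> f64 {
    // For a log-normal variable, E[X] = exp(mu + sigma^2 / 2); solve for mu.
    let mu = mean_ms.ln() - sigma * sigma / 2.0;
    // Box-Muller transform: two uniforms yield one standard normal sample.
    let (u1, u2) = (rng.next_unit(), rng.next_unit());
    let z = (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos();
    (mu + sigma * z).exp()
}
```

Calling `sample_batch_delay_ms(&mut rng, 50.0, 0.5)` repeatedly yields strictly positive delays whose average converges to roughly 50ms; a larger `sigma` gives a heavier tail of occasional long delays.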

Draft for now as I'm still cleaning up the code base as part of the final commit dropping PendingHTLCsForwardable.

@ldk-reviews-bot

ldk-reviews-bot commented Jun 25, 2025

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@tnull tnull marked this pull request as draft June 25, 2025 15:12
@joostjager
Contributor

joostjager commented Jun 25, 2025

Does this in any way limit users to not have delays or not have batching? Assuming that's what they want.

@tnull
Contributor Author

tnull commented Jun 25, 2025

Does this in any way limit users to not have delays or not have batching? Assuming that's what they want.

On the contrary actually: it effectively reduces the (mean and min forwarding) delay quite a bit, which we can allow as we're gonna add larger receiver-side delays in the next step. And, while it gets rid of the event, users are still free to call process_pending_htlc_forwards on a faster schedule if they really want to. IMO, this should result in a win-win situation: substantially reduced forwarding delays on average and by default, while still considerably improving receiver anonymity.

@tnull tnull force-pushed the 2025-06-batch-forwarding-delays branch from ceb3335 to 9ba691c Compare June 26, 2025 08:13
@joostjager
Contributor

Isn't it the case that without the event, as a user you are forced to "poll" for forwards, making extra delays unavoidable?

@tnull
Contributor Author

tnull commented Jun 26, 2025

Isn't it the case that without the event, as a user you are forced to "poll" for forwards, making extra delays unavoidable?

LDK always processes HTLCs in batches (note that process_pending_htlc_forwards never allowed forwarding just a single HTLC, for good reason). Having some batching delay makes a lot of sense in any scenario. And given that 'polling' is really cheap, users could consider doing that frequently. But they really shouldn't try to skip the batching entirely, as IO overhead/delay would come back to bite them (especially on busier forwarding nodes), and of course they should be 'good citizens' providing some privacy by default for the network.

@joostjager
Contributor

Polling may be cheap, but forcing users to poll when there is an event mechanism available, is that really the right choice? Perhaps the event is beneficial for testing, debugging and monitoring too?

@tnull
Contributor Author

tnull commented Jun 26, 2025

Polling may be cheap, but forcing users to poll when there is an event mechanism available, is that really the right choice? Perhaps the event is beneficial for testing, debugging and monitoring too?

The event never featured any information so is not helpful for debugging or 'informational' purposes. Plus, it means at least 1-2 more rounds of ChannelManager persistence, just to queue and remove the event. So since we don't need it anymore, we should def. drop it in production. As you know I was on the fence whether to drop it for testing, but now went this way, especially given that nobody indicated a strong opinion either way. If we indeed want to introspect the holding cell during testing (or, e.g., in fuzzing), we should add another approach to do it, but that's up for discussion.

@joostjager
Contributor

joostjager commented Jun 26, 2025

But at least the event could wake up the background processor, whereas now nothing is waking it up for forwards and the user is forced to call into channel manager at a high frequency? Not sure if there is a lighter way to wake up the bp without persistence involved.

Also if you have to call into channel manager always anyway, aren't there more events/notifiers that can be dropped?

As you know I was on the fence whether to drop it for testing, but now went this way, especially given that nobody indicated a strong opinion either way.

I may have missed this deciding moment.

If the assertions were useless to begin with, no problem dropping them of course. I can imagine though that at some points, a peek into the pending htlc state is still required to not reduce the coverage of the tests?

@tnull
Contributor Author

tnull commented Jun 26, 2025

But at least the event could wake up the background processor, whereas now nothing is waking it up for forwards and the user is forced to call into channel manager at a high frequency? Not sure if there is a lighter way to wake up the bp without persistence involved.

Also if you have to call into channel manager always anyway, aren't there more events/notifiers that can be dropped?

As you know I was on the fence whether to drop it for testing, but now went this way, especially given that nobody indicated a strong opinion either way.

I may have missed this deciding moment.

Again, the default behavior we had intended to switch to for quite some time is to introduce batching intervals (especially given that the current event-based approach was essentially broken/race-y). This is what is implemented here. If users want to bend the recommended/default approach they are free to do so, but I don't think it makes sense to keep all the legacy codepaths, including persistence overhead, around if it's not used anymore.

If the assertions were useless to begin with, no problem dropping them of course. I can imagine though that at some points, a peek into the pending htlc state is still required to not reduce the coverage of the tests?

I don't think this is generally the case, no. The 'assertion' that is mainly dropped is 'we generated an event'; everything else remains the same.

@tnull tnull force-pushed the 2025-06-batch-forwarding-delays branch from 9ba691c to b38c19e Compare June 26, 2025 09:49
@joostjager
Contributor

Again, the default behavior we had intended to switch to for quite some time is to introduce batching intervals (especially given that the current event-based approach was essentially broken/race-y). This is what is implemented here. If users want to bend the recommended/default approach they are free to do so, but I don't think it makes sense to keep all the legacy codepaths, including persistence overhead, around if it's not used anymore.

This doesn't rule out a notification when there's something to forward, to at least not keep spinning when there's nothing to do?

@tnull tnull force-pushed the 2025-06-batch-forwarding-delays branch from c1a0b35 to d35c944 Compare June 26, 2025 13:17
@tnull tnull self-assigned this Jun 26, 2025
@tnull tnull force-pushed the 2025-06-batch-forwarding-delays branch from d35c944 to c21aeab Compare June 27, 2025 09:29
@tnull tnull requested a review from TheBlueMatt June 27, 2025 09:29
@tnull tnull marked this pull request as ready for review June 27, 2025 09:29
@tnull
Contributor Author

tnull commented Jun 27, 2025

Finished for now with the test refactoring after dropping the PendingHTLCsForwardable event. This should be good for a first round of (concept) review. Whether or not we should add a notifier on top is up for debate.

@tnull tnull removed the request for review from TheBlueMatt June 27, 2025 09:36
@tnull tnull moved this to Goal: Merge in Weekly Goals Jun 27, 2025
@ldk-reviews-bot

✅ Added second reviewer: @valentinewallace

@tnull tnull requested review from TheBlueMatt and removed request for TheBlueMatt June 27, 2025 09:51
@@ -360,12 +376,24 @@ macro_rules! define_run_body {
            break;
        }

        if $timer_elapsed(&mut last_forwards_processing_call, cur_batch_delay) {
            $channel_manager.get_cm().process_pending_htlc_forwards();
Contributor

Looked a bit closer at this function. There is a lot of logic in there. Also various locks obtained.

@ldk-reviews-bot

🔔 1st Reminder

Hey @valentinewallace! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

🔔 2nd Reminder

Hey @valentinewallace! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@tnull tnull force-pushed the 2025-06-batch-forwarding-delays branch from c21aeab to e2ad6ca Compare July 2, 2025 09:55
@tnull
Contributor Author

tnull commented Jul 14, 2025

Now added a commit that adds a simple is_pending_htlc_processing helper that we use to skip waking on the batch delay and calling into process_pending_htlc_forwards.
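
To make the gating idea concrete, here is a minimal sketch of a background-loop step that only processes a batch once the batch delay has elapsed and the cheap "anything pending?" check returns true. The `ForwardQueue` type, its `needs_processing` and `process_batch` methods, and the `tick` function are hypothetical stand-ins for this example, not LDK APIs.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the forwarding state held by a channel manager.
struct ForwardQueue {
    /// Placeholder HTLC ids waiting to be forwarded.
    pending: Mutex<VecDeque<u64>>,
}

impl ForwardQueue {
    /// Cheap check, analogous in spirit to `is_pending_htlc_processing`.
    fn needs_processing(&self) -> bool {
        !self.pending.lock().unwrap().is_empty()
    }

    /// Handles everything queued in one batch; returns how many items were handled.
    fn process_batch(&self) -> usize {
        let mut pending = self.pending.lock().unwrap();
        let handled = pending.len();
        pending.clear();
        handled
    }
}

/// One step of a background loop. Skips work if the batch delay has not yet
/// elapsed or if nothing is queued; otherwise processes the whole batch.
fn tick(
    queue: &ForwardQueue,
    last_call: &mut Instant,
    batch_delay: Duration,
    now: Instant,
) -> usize {
    if now.duration_since(*last_call) < batch_delay || !queue.needs_processing() {
        return 0;
    }
    *last_call = now;
    queue.process_batch()
}
```

Passing `now` in explicitly keeps the step deterministic and easy to test; a real loop would presumably also resample the batch delay after each processed round.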

@tnull tnull force-pushed the 2025-06-batch-forwarding-delays branch from 64cb370 to d7387bf Compare July 14, 2025 11:34
    pub fn is_pending_htlc_processing(&self) -> bool {
        let has_forward_htlcs = !self.forward_htlcs.lock().unwrap().is_empty();
        let has_decode_update_add_htlcs = !self.decode_update_add_htlcs.lock().unwrap().is_empty();
        let has_outbound_needing_abandon = self.pending_outbound_payments.needs_abandon();
Contributor Author

@tnull tnull Jul 14, 2025

This seems pre-existing, but aren't we also pending processing if we have auto-retryable outbound payments, not only if they need abandoning (i.e., what we then process via check_retry_payments)? Not sure if I'm missing something here? (cc @valentinewallace)

Contributor

@valentinewallace valentinewallace Jul 14, 2025

I believe on main we push a forwardable event when an HTLC fails, thus triggering the check_retry_payments call when the event is handled. We might need to add a method to outbound_payments to see whether any payments are is_auto_retryable_now().

@tnull tnull force-pushed the 2025-06-batch-forwarding-delays branch 3 times, most recently from 4519c80 to e4142d6 Compare July 14, 2025 14:00
tnull added 6 commits July 14, 2025 16:26
.. as `forward_htlcs` now does the same thing
    .. as `fail_htlcs_backwards_internal` now does the same thing
We move the code into the `optionally_notify` closure, but maintain the
behavior for now. In the next step, we'll use this to make sure we only
repersist when necessary.

Best reviewed via `git diff --ignore-all-space`
We skip repersisting `ChannelManager` when nothing is actually
processed.
We add a reentrancy guard to disallow entering
`process_pending_htlc_forwards` multiple times. This makes sure that
we'd skip any additional processing calls if a prior round/batch of
processing is still underway.
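
The reentrancy guard described in the commit message above could be sketched roughly as follows. This is an assumption-laden illustration, not the PR's code: `Processor`, `RoundGuard`, and `process` are hypothetical names, and the sketch simply uses an atomic flag that is claimed on entry and released on drop.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical processor that refuses overlapping processing rounds.
struct Processor {
    in_progress: AtomicBool,
}

/// Clears the flag when dropped, so it is released even on early return or panic.
struct RoundGuard<'a>(&'a AtomicBool);

impl Drop for RoundGuard<'_> {
    fn drop(&mut self) {
        self.0.store(false, Ordering::Release);
    }
}

impl Processor {
    /// Runs `round` unless another round is already underway.
    /// Returns `true` if the round ran and `false` if it was skipped.
    fn process<F: FnOnce()>(&self, round: F) -> bool {
        // Atomically claim the flag; failure means a round is in progress.
        if self
            .in_progress
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            return false;
        }
        let _guard = RoundGuard(&self.in_progress);
        round();
        true
    }
}
```

With this shape, a nested or concurrent call to `process` while a round is running returns `false` immediately instead of blocking, which matches the "skip any additional processing calls" behavior the commit message describes.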
@tnull tnull force-pushed the 2025-06-batch-forwarding-delays branch from e4142d6 to dac90af Compare July 14, 2025 14:26
@valentinewallace
Contributor

Can we squash? I'm also tempted to ask you to break up this PR 😅 seems the complexity has grown since it's at +3000 now

Comment on lines +34 to +49
    const FALLBACK_DELAY: u16 = 50;
    let delay;

    #[cfg(feature = "std")]
    {
        const USIZE_LEN: usize = core::mem::size_of::<usize>();
        let mut random_bytes = [0u8; USIZE_LEN];
        possiblyrandom::getpossiblyrandom(&mut random_bytes);

        let index = usize::from_be_bytes(random_bytes) % FWD_DELAYS_MILLIS.len();
        delay = *FWD_DELAYS_MILLIS.get(index).unwrap_or(&FALLBACK_DELAY)
    }
    #[cfg(not(feature = "std"))]
    {
        delay = FALLBACK_DELAY
    }
Contributor

Can you document why we have different behavior in no-std?

@@ -6358,13 +6357,37 @@ where
}
}

/// Returns whether we have pending HTLC forwards that need to be processed via
/// [`Self::process_pending_htlc_forwards`].
pub fn is_pending_htlc_processing(&self) -> bool {
Contributor

s/is_pending_htlc_processing/needs_pending_htlc_processing? Right now to me it's ambiguous whether it means we're currently processing htlcs.

-/// Handles a PendingHTLCsForwardable and HTLCHandlingFailed event
-macro_rules! expect_pending_htlcs_forwardable_and_htlc_handling_failed {
+/// Processes any HTLC forwards and handles an expected [`Event::HTLCHandlingFailed`].
+macro_rules! process_htlcs_and_expect_htlc_handling_failed {
Contributor

@valentinewallace valentinewallace Jul 14, 2025

Now that we have the method that checks if forwarding is needed, can we revert some of the diff in tests and change expect_pending_htlcs_forwardable to call ChannelManager::is_pending_htlc_processing? Seems like strictly better test coverage and easier review. Matt got me overthinking the test coverage changes...

$htlc.prev_hop.counterparty_node_id;
let channel_id = prev_channel_id;
let outpoint = prev_funding_outpoint;
let htlc_id = $htlc.prev_hop.htlc_id;
Contributor

FWIW I requested this and also prefer only pulling out some vars

Successfully merging this pull request may close these issues.

Revisit PendingHTLCsForwardable delay duration
Randomize PendingHTLCsForwardable::time_forwardable internally
6 participants